{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>\n",
    "\n",
    "<i>Licensed under the MIT License.</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image Similarity Export\n",
    "\n",
    "In the Scenario->Image Similarity notebook [12_fast_retrieval.ipynb](../../../scenarios/similarity/12_fast_retrieval.ipynb) we implemented the approximate nearest neighbor search method to find similar images from a group of reference images, given a query input image. This notebook repeats some of those steps with the goal of exporting computed reference image features to text file for use in visualizing the results in an HTML web interface. \n",
    "\n",
    "To be able to test the model in a simple HTML interface, we export: the computed reference image features, a separate text file of reference image file names, and thumbnail versions of the reference images. The first two files are initially exported as text files then compressed into zip files to minimuze file size. The reference images are converted to 150x150 pixel thumbnails and stored in a flat directory. All exports are saved to the UICode folder. Notebook **2_upload_ui** is used to upload the exports to your Azure Blob storage account for easy public access. \n",
    "\n",
    "It is assumed you already completed the steps in notebook [12_fast_retrieval.ipynb](../../../scenarios/similarity/12_fast_retrieval.ipynb) and have deployed your query image processing model to an Azure ML resource (container services, Kubernetes services, ML web app, etc.) with a queryable, CORS-compliant API endpoint."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ensure edits to libraries are loaded and plotting is shown in the notebook.\n",
    "%matplotlib inline\n",
    "%reload_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Standard python libraries\n",
    "import sys\n",
    "import os\n",
    "import numpy as np\n",
    "from pathlib import Path\n",
    "import random\n",
    "import scrapbook as sb\n",
    "from sklearn.neighbors import NearestNeighbors\n",
    "from tqdm import tqdm\n",
    "import zipfile\n",
    "from zipfile import ZipFile\n",
    "\n",
    "# Fast.ai\n",
    "import fastai\n",
    "from fastai.vision import (\n",
    "    load_learner,\n",
    "    cnn_learner,\n",
    "    DatasetType,\n",
    "    ImageList,\n",
    "    imagenet_stats,\n",
    "    models,\n",
    "    PIL\n",
    ")\n",
    "\n",
    "# Computer Vision repository\n",
    "sys.path.extend([\".\", \"../../..\"])  # to access the utils_cv library\n",
    "from utils_cv.classification.data import Urls\n",
    "from utils_cv.common.data import unzip_url\n",
    "from utils_cv.common.gpu import which_processor, db_num_workers\n",
    "from utils_cv.similarity.metrics import compute_distances\n",
    "from utils_cv.similarity.model import compute_features_learner\n",
    "from utils_cv.similarity.plot import plot_distances, plot_ranks_distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fast.ai version = 1.0.57\n",
      "Cuda is not available. Torch is using CPU\n"
     ]
    }
   ],
   "source": [
    "print(f\"Fast.ai version = {fastai.__version__}\")\n",
    "which_processor()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data preparation\n",
    "We start with parameter specifications and data preparation. We use the *Fridge objects* dataset, which is composed of 134 images, divided into 4 classes: can, carton, milk bottle and water bottle. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Data location\n",
    "DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n",
    "\n",
    "# Image reader configuration\n",
    "BATCH_SIZE = 16\n",
    "IM_SIZE = 300\n",
    "\n",
    "# Number of comparison of nearest neighbor versus exhaustive search for accuracy computation\n",
    "NUM_RANK_ITER = 100\n",
    "\n",
    "# Size of thumbnail images in pixels\n",
    "MAX_SIZE = (150, 150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set: 27 images, validation set: 107 images\n"
     ]
    }
   ],
   "source": [
    "# Load images into fast.ai's ImageDataBunch object\n",
    "random.seed(642)\n",
    "data = (\n",
    "    ImageList.from_folder(DATA_PATH)\n",
    "    .split_by_rand_pct(valid_pct=0.8, seed=20)\n",
    "    .label_from_folder()\n",
    "    .transform(size=IM_SIZE)\n",
    "    .databunch(bs=BATCH_SIZE, num_workers = db_num_workers())\n",
    "    .normalize(imagenet_stats)\n",
    ")\n",
    "print(f\"Training set: {len(data.train_ds.x)} images, validation set: {len(data.valid_ds.x)} images\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load model\n",
    "\n",
    "Below we load a [ResNet18](https://arxiv.org/pdf/1512.03385.pdf) CNN from fast.ai's library which is pre-trained on ImageNet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn = cnn_learner(data, models.resnet18, ps=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, one can load a model which was trained using the [01_training_and_evaluation_introduction.ipynb](01_training_and_evaluation_introduction.ipynb) notebook using these lines of code:\n",
    "```python\n",
    "    learn = load_learner(\".\", 'image_similarity_01_model')\n",
    "    learn.data = data\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature extraction\n",
    "\n",
    "We now compute the DNN features for each image in our validation set. We use the output of the penultimate layer as our image representation, which, for the Resnet-18 model has a dimensionality of 512 floating point values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n"
     ]
    }
   ],
   "source": [
    "# Use penultimate layer as image representation\n",
    "embedding_layer = learn.model[1][-2] \n",
    "print(embedding_layer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computed DNN features for the 107 validation images,each consisting of 512 floating point values.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Compute DNN features for all validation images\n",
    "valid_features = compute_features_learner(data, DatasetType.Valid, learn, embedding_layer)\n",
    "print(f\"Computed DNN features for the {len(list(valid_features))} validation images,\\\n",
    "each consisting of {len(valid_features[list(valid_features)[0]])} floating point values.\\n\")\n",
    "\n",
    "# Normalize all reference features to be of unit length\n",
    "valid_features_list = np.array(list(valid_features.values()))\n",
    "valid_features_list /= np.linalg.norm(valid_features_list, axis=1)[:,None]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export for HTML Interface\n",
    "\n",
    "Here we package all of the data for upload to Blob Storage to interact with the model in a simple HTML interface. \n",
    "\n",
    "First, we export the computed reference features to ZIP file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "f = open(\"ref_features.txt\", 'w')\n",
    "f.write('[')\n",
    "f.writelines('],\\n'.join('[' + ','.join(map(str,i)) for i in valid_features_list))\n",
    "f.write(']]')\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we export the reference image file names to disk. Exported file names will include the parent directory name as well.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open(\"ref_filenames.txt\", 'w')\n",
    "f.write('[\"')\n",
    "f.writelines('\",\\n\"'.join((i[len(DATA_PATH)+1:]).replace(\"/\",\"_\").replace(\"\\\\\",\"_\") for i in valid_features.keys()))\n",
    "f.write('\"]')\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we compress the exported text data into Zip files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Writing files to zipfiles, one by one \n",
    "with ZipFile('ref_features.zip','w', zipfile.ZIP_DEFLATED) as zip: \n",
    "    zip.write(\"ref_features.txt\")\n",
    "with ZipFile('ref_filenames.zip','w', zipfile.ZIP_DEFLATED) as zip: \n",
    "    zip.write(\"ref_filenames.txt\")\n",
    "    \n",
    "# Remove the txt files\n",
    "os.remove(\"ref_features.txt\")\n",
    "os.remove(\"ref_filenames.txt\")\n",
    "\n",
    "# Make subfolder to hold all HTML Demo files and a subfolder for the zip files\n",
    "if not os.path.exists('../UICode'):\n",
    "    os.makedirs('../UICode')\n",
    "\n",
    "if not os.path.exists('../UICode/data'):\n",
    "    os.makedirs('../UICode/data')\n",
    "    \n",
    "# Move the zip files to the new directory\n",
    "os.replace(\"ref_features.zip\", \"../UICode/data/ref_features.zip\")\n",
    "os.replace(\"ref_filenames.zip\", \"../UICode/data/ref_filenames.zip\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we resize the reference images to 150x150 pixel thumbnails in a new directory called 'small-150'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make subfolder to hold all HTML Demo files and a subfolder for the zip files\n",
    "if not os.path.exists('../UICode/small-150'):\n",
    "    os.makedirs('../UICode/small-150')\n",
    "\n",
    "path_mr = '../UICode/small-150'\n",
    "\n",
    "# Now resize the images to thumbnails\n",
    "for root, dirs, files in os.walk(DATA_PATH):\n",
    "    for file in files:\n",
    "        if file.endswith(\".jpg\"):\n",
    "            #fname = path_mr +'/' + root[len(DATA_PATH)+1:] + '_' + file\n",
    "            fname = os.path.join(path_mr, root[len(DATA_PATH)+1:] + '_' + file) \n",
    "            im = PIL.Image.open(os.path.join(root, file))\n",
    "            im.thumbnail(MAX_SIZE) \n",
    "            im.save(fname, 'JPEG', quality=70)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.6.9 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
